AMENDMENTS TO THE CLAIMS 

The following listing of claims will replace all prior versions and listings of claims 
in the application. 

Listing Of Claims 

1. (original) A media capture device, comprising:
a media capture mechanism; 

an audio input receptive of user speech relating to a media capture activity 
in close temporal relation to the media capture activity; 

a plurality of focused speech recognition lexica respectively relating to 
media capture activities; 

a speech recognizer adapted to recognize the user speech based on a 
selected one of the focused speech recognition lexica; 

a media tagger adapted to tag captured media with text generated by said 
speech recognizer based on close temporal relation between receipt of recognized user 
speech and capture of the captured media; and 

a media annotator adapted to annotate the captured media with a sample 
of the user speech that is suitable for input to a speech recognizer based on close 
temporal relation between receipt of the user speech and capture of the captured 
media. 

2. (original) The device of claim 1, further comprising an input receptive of a
user identity, wherein said speech recognizer is adapted to recognize user speech 
based on the user identity.

3. (original) The device of claim 2, wherein said speech recognizer is
adapted to employ focused lexica based on the user identity. 

4. (original) The device of claim 1, wherein said speech recognizer is
adapted to select a lexicon based on the user speech and a predefined heuristic relating 
to voice tags associated with the lexica. 

5. (original) The device of claim 1, further comprising a user interface
adapted to permit a user to navigate between and select a lexicon. 

6. (original) The device of claim 1, further comprising a media retrieval
mechanism adapted to retrieve captured media from memory of the device by matching 
a tag of the captured media to recognition text generated from user speech received
and recognized during a retrieval mode of the device. 

7. (original) The device of claim 1, further comprising a media retrieval
mechanism adapted to retrieve captured media from memory of the device by matching 
an annotation of the captured media to user speech received during a retrieval mode of 
the device using sound similarity metrics to align an annotation with a spoken query.

8. (original) The device of claim 1, further comprising a lexicon editor
adapted to supplement a lexicon based on an annotation, letter to sound rules, and user 
speech corresponding to spelled word input received and recognized during a lexicon 
edit mode of the device. 

9. (original) The device of claim 1, further comprising an external data
interface adapted to transmit annotations to a post processor having greater speech 
recognition capabilities than said device. 

10. (original) The device of claim 1, further comprising:

an external data interface receptive of lexicon contents; and 

a lexicon editor adapted to store the lexicon contents in device memory. 

11. (currently amended) A media tagging system, comprising:

a portable media capture device adapted to capture media, to receive user 
speech in close temporal relation to a media capture activity, and adapted to annotate 
captured media with a sample of the user speech that is suitable for input to a speech 
recognizer based on close temporal relation between receipt of the user speech and 
capture of the captured media; and 

a post processor adapted to receive annotations from the device, perform 
speech recognition on the annotations based on a plurality of focused speech 
recognition lexica that respectively relate to media capture activities, and tag related
captured media with text generated during speech recognition performed on the
annotations. 

12. (original) The system of claim 11, comprising a source of predefined,
focused lexica relating to media capture activities and adapted to communicate focused 
lexica to said media capture device according to device type over a communications 
network. 

13. (original) The system of claim 11, comprising a source of predefined,
focused lexica relating to media capture activities and adapted to communicate focused 
lexica to said post-processor over a communications network. 

14. (original) The system of claim 11, comprising a lexicon editor provided to
at least one of the device and the post processor and adapted to customize a focused 
lexicon for a user of the device. 

15. (original) The system of claim 11, comprising a mapping module adapted
to convert textual tags associated with captured media to alternative textual tags based 
on predetermined criteria relating to a media capture activity. 

16. (original) The system of claim 11, wherein said device is adapted to
perform a relatively limited amount of speech recognition on the annotation compared to 
an amount of speech recognition performed by said post-processor, the relatively 
limited amount being limited in at least one of time and search space due to at least one
of relatively lower processing power and relatively lower memory capacity of said
device, and to tag related captured media with recognition text generated during the 
relatively limited amount of speech recognition. 

17. (original) The system of claim 11, wherein said post-processor is receptive
of captured media from said device, and is adapted to organize the captured media 
according to at least one of annotations and textual tags associated with the captured 
media, including clustering at least one of annotations and textual tags based on at 
least one of acoustic similarity measures and semantic similarity measures. 

18. (original) A media tagging method for use with a media capture device, 
comprising: 

capturing media with the media capture device during a media capture 
activity conducted by a user of the device; 

receiving user speech via an audio input of the device in close temporal 
relation to the media capture activity; 

annotating captured media by storing the captured media in memory of 
the device in association with a sample of the user speech that is suitable for input to a 
speech recognizer; 

recognizing the user speech with a speech recognizer of the device 
employing a focused speech recognition lexicon relating to the media capture activity; 
and

tagging captured media with recognition text generated during recognition
of the user speech by storing the captured media in memory of the device in association 
with the recognition text. 



19. (original) The method of claim 18, further comprising selecting a focused 
speech recognition lexicon relating to the media capture activity from a plurality of 
focused lexica relating to media capture activities that are stored in memory of the 
device. 

20. (original) The method of claim 19, wherein said step of selecting the
focused speech recognition lexicon is based on the user speech and a predefined 
heuristic relating to voice tags associated with the lexica. 

21. (original) The method of claim 19, wherein said step of selecting the
focused speech recognition lexicon is based on user navigation of the lexica via a user 
interface of the device. 

22. (original) The method of claim 18, further comprising receiving a user
identity, wherein said step of recognizing the user speech is based on the user identity.

23. (original) The method of claim 22, further comprising selecting, based on
the user identity, a focused speech recognition lexicon relating to the media capture 
activity from a plurality of focused lexica relating to media capture activities that are 
stored in memory of the device. 

24. (original) The method of claim 18, further comprising retrieving captured 
media from memory of the device by matching a tag of the captured media to 
recognition text generated from user speech received and recognized during a retrieval
mode of the device. 

25. (original) The method of claim 18, further comprising retrieving captured 
media from memory of the device by matching an annotation of the captured media to 
user speech received during a retrieval mode of the device using sound similarity 
metrics to align an annotation with a spoken query. 

26. (original) The method of claim 18, further comprising supplementing a
lexicon stored in device memory based on an annotation, letter to sound rules, and user 
speech corresponding to spelled word input received and recognized during a lexicon 
edit mode of the device. 

27. (original) The method of claim 18, further comprising receiving lexicon 
contents and storing the lexicon contents in device memory.

28. (original) The method of claim 18, further comprising transferring
annotations from the device to a post processor having greater speech recognition 
capability than the device. 

29. (original) The method of claim 28, further comprising: 

performing speech recognition on annotations received from the device; and

tagging related captured media with text generated during speech 
recognition performed on the annotations. 

30. (original) The method of claim 28, comprising transferring focused lexica 
from a source of predefined, focused lexica to the post processor. 

31. (original) The method of claim 18, comprising transferring focused lexica
from a source of predefined, focused lexica to the device. 

32. (original) The method of claim 18, comprising customizing a focused 
lexicon for a user of the device. 

33. (original) The method of claim 18, comprising converting textual tags
associated with captured media to alternative textual tags based on predetermined 
criteria relating to a media capture activity.

34. (original) The method of claim 18, further comprising organizing the
captured media according to textual tags associated with the captured media, including 
clustering textual tags based on semantic similarity measures. 

35. (original) The method of claim 18, further comprising organizing the 
captured media according to annotations associated with the captured media, including 
clustering annotations based on acoustic similarity measures.